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^"-^ Abstract. The structure of many real- world optimization problems includes minimization of 

' _ ' a nonlinear (or quadratic) functional subject to bound and singly linear constraints (in the form of 

(_^ either equality or bilateral inequality) which are commonly called as continuous knapsack problems. 

Ch Since there are efficient methods to solve large-scale bound constrained nonlinear programs, it is 

K^^ desirable to adapt these methods to solve knapsack problems, while preserving their efficiency and 

^ convergence theories. The goal of this paper is to introduce a general framework to extend a box- 

^^ constrained optimization solver to solve knapsack problems. This framework includes two main 

^^ ingredients which are 0(n) methods; in terms of the computational cost and required memory; for 

the projection onto the knapsack constrains and the null-space manipulation of the related linear 

00 constraint. The main focus of this work is on the extension of Hager-Zhang active set algorithm 
^~H (SfAM J. Optim. 2006, pp. 526—557). The main reasons for this choice was its promising efficiency 

in practice as well as its excellent convergence theories (e.g., superlinear local convergence rate 

1 I without strict complementarity assumption). Moreover, this method does not use any explicit form 
L_ , J of second order information and/or solution of linear systems in the course of optimization which 

makes it an ideal choice for large-scale problems. Moreover, application of Birgin and Martinez 
active set algorithm (Comp. Opt. Appl. 2002, pp. 101-125) for knapsack problems is also briefly 
r*^ commented. The feasibility of the presented algorithm is supported by numerical results for topology 

optimization problems. 
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1. Introduction. The goal of this paper is to develop efficient methods to solve 



*y^ large-scale linearly constrained optimization problems with the following structure 

J^ min/(x) s.t. x G 2? (1.1) 



the feasible set T) is equal to either of Vg or Vi which are defined as follows 

Vs =^ {x e 6 : a^x = 6 } (1.2) 

Vi =^ {x e S : 5, < a^x sC 6„} (1.3) 



"^ B'^^^ {xeR" : li^x^u} (1.4) 

where / is a real-valued continuously differentiable function defined on V, a, 1, u S 
K", b,bi,bu S M and 1 ^ u. In the related literature, set V is usually called as 
the continuous knapsack constraints. Although problem (1.1) can be studied under 
context of general linearly constrained optimization problems, due to the importance 
and wide range of applications, there are a lot of works specifically conducted to solve 
problems with (1.1) sprecture, e.g. see: [27, 26, 11, 9, 25, 23, 29, 16]. Optimization 
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problems like (1.1) occur frequently in the filed of operational researchs like optimal 
resource allocation and marketing, refer to [7, 28] as some topical reviews. 

In addition to the mentioned applications, finite-dimensional counterpart of some 
constrained infinite-dimensional optimization or control problems leads to a problem 
with structure like (1.1). Topology optimization problems with resource constraint 
can be accounted in this context. Note that, in this case we have also a PDE as an 
equality constraints. However it is easy to remove these constraints by performing 
the optimization on the reduced space spanned by the state PDE. 

Although in the past two decades, several methods with good convergence theories 
have been introduced to solve linearly constrained optimization problems, develop- 
ing an efficient method to solve large scale problems (in terms of computational cost 
and memory usage) is still remained as an open problem. An 0(n) growth of the 
memory usage and computational cost per iteration as well as the global convergence 
and quadratic or superlinear local convergence seems to be desirable properties of an 
efficient method. To realize this goal it is required to avoid solutions of linear sys- 
tems of equations per iteration or at least avoiding the exact solution of such systems. 
This goal is almost realized in the case of bound-constrained optimization problems 
and currently lot of methods are available to solve large scale box-constrained op- 
timizations. In this context we can recall the active set Newton algorithm of [13], 
affine-scaling interior-point Newton methods of [21] and new active set algorithm of 
[20]. One of the most important properties of these methods is relaxing the strict 
complementarity condition to the strong second order sufficient optimality condition 
to prove the local superlinear rate of convergence. 

It seems that, a key to develop an efficient method to solve large scale knapsack 
constrained optimization problems is to adapt large scale box-constrained solvers for 
such structures; while preserving their efficiency and convergence theories. There 
were a few works in which this clue was taken into account. In [11], by introduc- 
ing an efficient method to project a trial step on to the knapsack constraints, the 
projected Barzilai-Borwein method [10] was extended to solve knapsack constrained 
optimization problems. Modifying the search direction to keep iterations interior with 
respect to the related linear constraint, [16] extended the affine-scaling interior-point 
CBB method [18] to solve knapsack constrained optimization problems. However, 
the mentioned methods possess a linear local rate of local convergence at the best 
conditions. 

Roughly speaking, the goal of this paper is to introduce a general framework to 
extend almost every bound-constrained optimization method to solve knapsack con- 
strained problems without destroying its efficiency and convergence theories. For this 
purpose we shall specifically concentrate on new active set algorithm by [20]. The 
main reasons for this choice are its excellent convergence and efhciency in contrast 
to alternative methods. Moreover, we briefly discuses the extension of the Bigin- 
Marinez active-set algorithm (GENCAN solver) [4] to solve knapsack problems. This 
algorithm is very similar to that of Hager and Zhang. It is worth mentioning that 
extension of this algorithm to solve linearly constrained nonlinear programs is already 
performed in [1], which is called as GENLIN solver. However, it is not a computa- 
tionally feasible algorithm for large scale problems. We are hopeful that, following 
the presented procedure, it be possible to extend other alternative methods for such 
problems without important technical difficulties. 

The efficient and stable implementation of the projection onto the knapsack con- 
strains and null-space treatment of the linear constraint are main ingredients of the 
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presented framework for mentioned extension in this study. Our method to project a 
point onto the space of knapsack constrains is partly identical to that of [11, 22]. But, 
the technical details are different and some new results are also included. According 
to our knowledge, an efficient and stable implementation of the null-space method for 
knapsack constrains is introduced in this study for the first time. 

Notations. For any scalar u € M, 1;+ = max{u, 0} and v" = min{w, 0}. The me- 
dian operator acting on triple {u, v, w} is denoted by mid(M, w, w), i.e., inid(u, v, w) = 
max{w, min{ti, w}}. For an arbitrary vector v G M" and bound vectors 1, u e M" (1 ^ 
u), the operator mid(l, v, u) results a vector w e M" such that Wi — mid{li,v.i, Ui), i = 
1, . . . ,n. The gradient and Hessian of the objective function /(x) with respect to x 
are denoted by Vx/(x) and Vx/(x) respectively. For any x e S the active and in- 
active index-set (with respect to bound constraints) are denoted by A{x} and X(x) 
respectively, i.e. 

A{x.) — {i G [0, n] : Xi = k or Xi — Ui} 

X(x) = {i E [0,n] : li < Xi < Ui} 

The operator dim (yl(x)) returns the dimension of the active set with respect to bound 
constraints at x. Assuming dim (yl(x)) — t, the operator shrink(x) gives the row 
(column) vector x £ M" as the input argument and returns the reduced row (column) 
vector V S ]R"~* which includes only entries of x corresponding to the inactive indices. 
Similarly, the operator expand(v) gives the row (column) vector v G M"^* as the input 
argument and returns the expanded row (column) vector x G M" by adding the active 
indices to v accordingly. The subscript k (Dfc) is often used to denote the quantity of 
interest at the fc-th iteration. When there is no confusion, we may use vj, to denote 
v(xfc). 

2. Projection onto the knapsack constraints. As it was mentioned in sec- 
tion 1, the projection of a trial point onto the feasible set (V) is one of the main 
ingredient of the presented method in this study. This issue is studied in this section. 
Consider the trial point y e M", the projection of y onto the feasible set; which is 
denoted by z e M" here; is equal to solution of the following constrained least square 
problem 

1 2 

z :=7'-D(y) = argmin -||x-y||2 (2.1) 

xeij z 

'Pviy) in (2.1) denotes the projection operator which projects a trial point y onto 
the feasible set V with respect to the euclidean norm. In the first part of this section 
we shall consider projection onto Vg. Later we extend our method to solve problem 
of projection onto I?x- Solution of (2.1) with V = Vs is equivalent to solving the 
following quadratic programming (QP) problem 

min -X Ix — y X s.t. : a x = 6, 1 ^ x ^ u, (2-2) 

where I e K"^" denotes the identity matrix. In general solution of a QP problem is 
expensive, however, exploiting specific structure of (2.2) results an efficient computa- 
tional method which is discussed here. 

Proposition 2.1. The feasible set 2?£ is non-empty if the following conditions 
hold 



E 



i=l 



(ujOj -I- ka^j^) s^ & < ^(Mja+ -I- ka^ ) 



Proof. The proof is trivial considering the geometry of Vg (also see equation 2.6 
in [11]). D 

Theorem 2.2. Assume that the feasible domain Vg is non-empty. Then problem 
(2.2) has a unique solution x* which is computed by 

X* == mid(l, y - A*a, u) (2.3) 

where X* G M" is the Lagrange multiplier corresponding to linear equality constraint 
a'^x = b at the unique KKT point of (2.2), and A* is equal to the unique root of the 
following equation 



h{X) = b-'^[ai mid(Zi, y^ - Aa^, Ui) ] . 



Proof. The existence and uniqueness of solution and Lagrange multiplier is fol- 
lowed directly by the strict convexity of the problem and non-emptiness assumption of 
the feasible domain. Now let us to reformulate (2.2) as a box-constrained optimization 
problem by augmenting the linear equality constraint to the objective function 

/:[x;A] = - x^I x-y^x + A(a^x-5) s.t. : l<x<u (2.4) 

where A € M is the lagrange multiplier corresponding to the linear equality constraint. 
It is obvious that the unique stationary point of Lagrangian £[x; A] constrained by 
box B which is shown by (x*, A*) is equal to the solution of (2.2). 

For a fixed value of A, the box constrained stationary point of (2.4), x*(A), can be 
computed by the projected gradient method (cf. [24]). Using optimality conditions 
based on the projected gradient method, x* (A) is equal to the unique zero of projected 
gradient (with respect to x) of Lagrangian £, i.e., 

7'e(x*(A) - Vx/:[x*(A); A]) - x*(A) = (2.5) 

where Vb{) denotes the projection operator onto B. Simplification of (2.5) results 

7'e(y-Aa)-x*(A) = (2.6) 

since for an arbitrary vector v G M", 7'e(v) is equal to mid(l, v, u), the solution of 
equation (2.6) can be written in the following explicit form 

x*(A) ^mid(l, y- Aa, u) (2.7) 

considering the necessary optimality conditions of (2.2), A* is equal to unique zero of 
the following non-smooth equation 



h{X) = b-^a,x*{\) {2.. 



which completes the proof. D 

Therefore, finding the unique solution of single parameter non-smooth equation 
h{X) = is the main step to solve (2.2). Function h{X) is a piecewise linear function 
with at most 2n breakpoints. The corresponding A of these breakpoints can be shown 
by set 

Tx^{\l An« = l,...,n; a. ^0} 
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where 

>^\ ^ [Ui ~ h) / ai, \i ^ {yi-Ui)/ai, i^l,...,n, 

it is evident that A" ^ A' for a^ > and A' ^ A" for ai < 0. Considering A coordinates 
of breakpoints, for ai > 0, Xi{X) can be expressed in the fohowing form, 



x^iX)={ y^-\a^, if \f^\^\\, (2.9) 



if 


A< A^, 


if 


Ar < A ^ A^ 


if 


A^A^ 


if 


A> A^ 


if 


A' < A s^ A^ 


if 


A < Af-. 



in the same way for ai < 0, we have 



x^{X)^{ y^-\a^, if A^^As^A^, (2.10) 

note that for ai = 0, Xi is independent from A and is equal to inid(Zi, yi, Ui). In 
this case, it is possible to consider a reduced counterpart of the original problem. 
Therefore, without loss of generality, we can consider ai 7^ 0. Considering (2.9) 
and (2.9), Xi{X) is a continuous piecewise linear and monotonically nonincreasing 
function of A for a^ > 0; and in the same way; Xj(A) is a continuous piecewise linear 
and monotonically nondecreasing function of A for a^ < 0. Therefore, according to 
(2.7) and (2.8), h{X) is a continuous piecewise linear and monotonically nonincreasing 
function of A. It is obvious that finding the root of (2.8) is the most expensive part 
of projection in this section. 

In [22] a median based breakpoint searching algorithm was introduced to find 
the unique root of (2.8). This algorithm is somehow bisecting the set of breakpoints 
(consider the traditional interval bisection method). Although, by using 0(n) median 
finding method, the computational complexity of this algorithm is 0(n), the propor- 
tionality constant and the amount of operations per step is relatively large. In [11] 
a two-phase algorithm was suggested to find the root of (2.8). In the first stage, the 
bracketing phase, an interval [Al,A/j] where /i(Ai) • ft.(Ajj) < is found. Then in 
the second phase, secant step, the root of (2.8) is computed by an accelerated secant 
algorithm. In the present study, we use a combination of the bisection and quadratic 
interpolation methods to find the root of (2.8). 

Proposition 2.3. Assume Xl — min{ A', A" | i = 1, . . . ,n; a^ 7^ 0} and 
Xfi — max{ A', A" | i = 1, . . . , n; Oj 7^ 0} then /i(Ai) •/i(Afl) < or h{XL)-h{Xji) — 0, 
i.e., the root of (2.8) happens within the interval (Ai,A/j) or it is equal to either of 
Xl and Xr. Moreover, h{X) ^ for X ^ Xl and h{X) ^ for X ^ Ai,;. 

Proof. Since h{X) is a continuous piecewise linear and monotonically nonincreasing 
function of A and it has a unique root, we should have h{X) ^ for A ^ A^ and 
h{X) ^ for A ^ A_R. So the root of h{X) happens within the interval [Al, A_r]. D 

Considering the Proposition 2.3, the bracketing phase used in [11] is not required 
as it is a-priori known. Therefore, it is possible to find the root of h{X) by the 
traditional interval bisection method. 

Assume that the error at the fc-th step of the bisection algorithm is shown by 
Efc, where eo = (A_r — Ai,)/2. Then Ck+i = efc/2. Therefore the number of bisection 
steps to find A* within tolerance e is equal to log2(eo/e). Considering the limited 
arithmetic precision of computers, the upper bound on number of bisection steps is 
priori known and is independent from n. Notice that the number of set bisection 



in [22] is function of n. Therefore, for large n, the worst case complexity of the 
interval bisection method should be better than the worst case complexity of the set 
bisection algorithm of [22]. Moreover, when the interval became sufficiently small (in 
the interval bisection algorithm), it is possible to do a linear interpolation between 
three consecutive available points to find a trial root. Then discarding the algorithm 
if the trial root is zero of h{X). 

In contrast to alternative methods, the bisection method is the most robust algo- 
rithm and is enable to find the root within the machine precision in a finite number 
of steps. However, its linear rate of convergence is cumbersome. In the present study 
we use the algorithm introduced in [6] to find zero of h{X). This algorithm is a com- 
bination of the bisection and the inverse quadratic interpolation methods. By this 
strategy, Brent algorithm benefits from the good convergence-rate of quadratic inter- 
polation and the robustness of the bisection method. Prior to describing this method, 
it is worth to state the following remark to decrease the computational complexity 
(hopefully with a log2 slope) in the course of iterations. Assume that the current 
search interval is shown by [A; , A^] . 

Remark 2.4. Considering (2.9) and (2.10) together with the current value of 
[A; , Ar] , it is possible to reduce the computational cost by freezing the components of 
X vector for which Xi is determined. More clearly, when Oi > 0, x*{X) can be fixed to 
Ui or li if Xr ^ A" or A; ^ A' respectively; and when Oi < 0, x* (A) can be fixed to Ui 
or li if A; ^ A" or Aj. ^ A' respectively. 

Consider tu S I^"*^ as the machine precision, we are looking for the unique root 
of h{X) within precision e e M+ (e > ej\/). The machine precision can be computed 
by algorithm 665 of ACM TOMS [8] . In the root finding algorithm in this study we 
have three points Aq, Af, and Ac such that h{Xb) ■ h{Xc) < 0, |/i(Ab)| < |/i(Ac)|, and 
Xa may coincide with Ac. Initially Ab = Al, Ac = Xji and Xa = Ac. The point Af, is 
considered as the best approximation to A* in the course of iterations. 

Consider A> = (Ac — Ab)/2. If A> ^ e the value of A;, is returned as an ap- 
proximation to A*, else an inverse quadratic interpolation is used to compute a trial 
approximation for the root of function h. For the convenience, h{Xx) is shown by h^ 
henceforth. Assume we have three distinct points (Aa,/ia), (A{,,/ib) and (Ac,/ic), then 
the quadratic lagrange interpolation formulae can be written as follows 

, _ {hx - hb){hx - he) {hx - hc){hx - hg) (hx - ha){hx - h) 

(ha - hb){ha - he) " (hb - hc)ihb - ha) (he- ha) {he- hb) 

The root of this quadratic approximation. At , can be written in the following explicit 
form 

Xt^Xb+p/q (2.11) 

where r = hb/hc, s = hb/ha, t ~ ha/hc, q = ±(i — l)(r — l)(s — 1) and p = 
zLst{r ~ t){Xc — Xb) — s(l — r)(Xb — Xa). At the start of iterations in which there are 
only two distinct points, a linear interpolation is used to find the trial root. When 
g « the overflow problem may cause to the failure of computation. Therefore, the 
trial root At will be rejected in this case. Moreover, when the interpolating parabola 
has two roots between Xb and Ac or when its root is located outside of this interval, the 
interpolation is poor and the trial root At should be rejected. In the case of inefficient 
step {\p\ ^ e\q\) or when the current p/q is greater than half of its previous value, the 
trial root At will be rejected to avoid the slow convergence. Anyway the trail root 
which is resulted from the inverse interpolation be rejected, a bisection step will be 



performed. In summary, if either of the following conditions is met the trial root of 
the inverse quadratic interpolation is rejected and a bisection step is used instead 

\p\ ^ 2/3 I^AaI, \p\ ^ e\ql \p/q\ ^ 1/2 \vlq\oM 

The stopping criteria is a critical issue to avoid excess computations when round- 
ing errors prevent further progress toward the exact solution in the vicinity of root. 
Following [30], the following relation is used as the stopping tolerance in this study 

to/==2eM|A6|+e/2 

Note that the above criterion docs not tell us how close we are to the root, but only 
that we are in some interval about the zero where roundoff error may be dominating 
our calculations. 

According to our numerical experiments, this method finds the desired root within 
the machine precision by a fewer function evaluations in contrast to mentioned alter- 
natives (in our experience the number of function evaluation calls were usually below 
12 for double-precision arithmetic and e = l.e — 15). 

Remark 2.5. Similar to new line-search algorithm introduced in [19], the men- 
tioned root finding method robustly tolerates the limited machine precision; simulta- 
neously remains efficient as much as possible. Therefore, it has a good potential to he 
adapted as an alternative robust line-search method. In particular, in contrast to that 
of [19] which uses inverse linear interpolations (secant steps), it uses inverse quadratic 
interpolations without additional function evaluation. 

Now, consider the solution of (2.1) with V = Vx which is equivalent to the 
following QP problem 

min - x^I x - y^x s.t. : bi ^ a^x s^ 6„, 1 ^ x ^ u, (2.12) 

Proposition 2.6. The feasible set T>x is non-empty if the following conditions 
hold 

n n 

'^{uia~ -\- kaf) i^k i^bu^ '^{uiuf -I- ka~) 

i=l i=l 

Proof. The proof is directly followed from proposition 2.6. D 

The following theorem provides an elegant method to compute the solution of 

(2.12) using results of theorem 2.2. 

Theorem 2.7. Assume that the feasible set T>x is non-empty. Then problem 

(2.12) has a unique solution x* which is computed by the following relation 

x*=mid(x^, y, x^) (2.13) 

where x^,x^ G M" are unique solutions of the following QP problems 

XL := min - x"* I x — y"* x s.t. : a^ x = bi, 1 ^ x ^ u, (2-14) 
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X[/ := min - x I x — y x s.t. : a x = fe„, 1 ^ x ^ u, (2-15) 



Proof. The existence and uniqueness of solutions for problems (2.12), (2.14) and 
(2.15) are directly followed by the strict convexity of related problems and the non- 
emptiness assumption of 2?x- 

Same as previous, we reformulate (2.12) as a bound constrained optimization 
problem by augmenting the linear inequality constraints to the objective function 

£[x;/x/;/x„] = - x^I x-y'^x + ^;(a^x-6() +^„(a^x-6„) s.t. : 1 sC x < u 

where ni, Hu G K are the lagrange multipliers corresponding to the inequality con- 
straints a-^x ^ bi and a^x ^ &„ respectively. It is evident that the unique stationary 
point of Lagrangian £[x; /i/; /i„] constrained by box B which is shown by (x*, //,*, /i* ) 
is equal to the unique solution of (2.12). 

For fixed values of ni and /i„, the bound constrained stationary point of La- 
grangian £, i.e., x*{fj,i, Hu), can be computed by the projected gradient method. 
Using optimality conditions based on the projected gradient method, x*{fii,^u) is 
equal to the unique zero of the projected gradient (with respect to x) of Lagrangian 
C. Using the same way like (2.5) results 

x*(/ii,/i„) =mid(l, y- (/i; +/i„)a, u) (2.16) 

assuming /i = /i; + /!„ simplifies (2.16) to the following form 

x*(M)=mid(l, y-Ma,u) (2.17) 

note that at least either of /i; or /i„ is zero at the optimal solution, so, as (2.17) shows, 
knowing ^i + fi^ at the optimal solution is sufficient to uniquely determine /ij* and 
/x*. Considering the necessary optimality conditions of (2.12), the remaining job is 
to compute the optimal value of fj, (which is shown by /x*) such that the following 
inequalities are satisfied 

&/s^a^x*(Ai)<6„ (2.18) 

or 

bi < a^mid(l, y - /la, u) s^ 6„ (2.19) 

due to the uniqueness of lagrange multiplier /i* at the optimal solution, the inequality 
equation (2.19) has only one solution. Without confusion with the previous definition 
for /i(A), assume the following new definition for function h{ii) 

/i(/i)-a^x*(/i) 

Similar to the previous arguments on function /i(A), it is easy to show that /i(/i) is 
a continuous piecewise linear and monotonically nonincreasing function of /i with at 
most 2n breakpoints (to save space we avoid repetition of similar discussions for /i(/x)). 
Assume A;* and A* are optimal lagrange multipliers corresponding to linear equal- 
ity constraint in problems (2.14) and (2.15) respectively. Considering the mentioned 
monotonicity of functions h{X) and ft.(/i), the following inequalities are identical 

a; ^ // ^ a; (2.20) 

therefore an appropriate search interval for fi* is the bracket [A*,A^]. Considering 
(2.3), (2.19) and (2.20), the following inequalities hold 

xK X* s^ x^ (2.21) 



notice that the inequahties are understood componentwise here. Refereing again to 
the mentioned monotonicity of functions h{X) and h(ji), when x^ ^ x* the inequaUty 
a'^x* ^ bi always holds. In the same way, the inequality a-^x* ^ b^ will be remained 
always satisfied when x* ^ x^. Therefore, replacing bound constraints 1 and u re- 
spectively with x*j^ and x^ reduces problem (2.12) to the following bound constrained 
QP problem 

min - x^I X - y^x s.t. : x^ < x s^ x^ (2.22) 
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with the explicit solution 



X* =mid(x^, y, x^) 



which completes the proof. D 

Therefore, it is possible to solve (2.12) in expense of solving two problems with 
structures similar to (2.2). However, the asymptotic computational cost is not es- 
sentially duplicated in this way. This is because of some shared calculations like 
computing A^, Xfi, /i(Al) and h(Xii). Moreover, the trajectory points produced dur- 
ing the solution of (2.15) can be easily used to reduce the initial search bracket to 
solve (2.14). For instance, if pair {Xx,hx) was produced during the solution of (2.15), 
the corresponding h to A^; when solving (2.14) is equal to h^ + bi ~ bu- 

3. Null-space management of linear constraints. In this section we present 
an 0(n) method (in terms of computational cost and consumed memory) to treat the 
linear constraint in knapsack problem. For the convenience, the basic idea of the 
null-space methods is recalled at the first part of this section (for more details see 
chapter 5 of [14]). 

Let A e ]U"ix" be a full rank matrix. The null-space of A is denoted by 

AA(A) = {p e M" : Ap = 0} 

which is the set of vectors orthogonal to the rows of A. The null-space of A is 
a subspace of M" with dimension n — m (consider the full-rank assumption of A). 
Therefore, any linear combination of two vectors in N{A) is also in M{A). Any 
matrix Z g ]gnx(n-Tn) .^^jjQgg columns form a basis for M{A) can be considered as a 
null-space matrix for A. It is easy to show that Z satisfies A-^Z = 0. 

The range-space of A is defined as a subspace of E" with dimension m which is 
spanned by the columns of the A (the set of all linear combinations of A columns). 
In particular, we are interested in the range space of A-^, defined by 

7^(A^) = {q e M" : q = Ar for some r G M™} 

It is easy to show that N{A.) and TZ{A^) are orthogonal subspaces. Therefore, it is 
possible to uniquely decompose an arbitrary vector x S M" into sum of the range- 
space, q £ TZ{A^), and null-space, p G A/'(A), components 

x = q + p = q + Zv 

where v e M^"^™). Consider linear system of equations A-^x = b (b e M'"). As- 
suming xq G M" as a particular solution for A"^x — b, any other solution x can be 
parameterized as 
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x = xo + Zv (3.1) 

In fact Zv acts as feasible directions for matrix A (note that AZv = 0). 
Now consider the foUowing optimization problem 

min fix) s.t. : A^x = b (LEP) 

xeR" 

using (3.1), it is possible to convert the linearly constrained optimization problem 
(LEP) on M" to the following unconstrained optimization problem on the reduced 

space M("""\ 

min /(xo + Zv) (RLEP) 

veAA(A) 

the gradient and Hessian of / with respect to the reduced vector v at the trial point 
Vq can be easily computed using the chain rule, i.e., 

Vv/(xo + Zvo) =Z^Vx/(xo + Zvo) 
V2/(xo + Zvo) =Z^V2/(xo + Zvo)Z 

the necessary and sufficient conditions for the reduced problem (RLEP) is same as 
the classical unconstrained optimization problems in which the gradient vector and 
the Hessian matrix are replaced by the reduced counterparts. Therefore, any uncon- 
strained optimization solver can be employed to solve (RLEP) without any technical 
difficulty. 

Now, consider the following inequality constrained optimization problem 

min fix) s.t. : A^x > b (LIP) 

Using an appropriate active set strategy, it is possible to solve (LIP) with the null- 
space method. Assume the set of active constraints at the local solution is known. A, 
then the corresponding unconstrained optimization problem is solved on a null-space 
spanned by only active constraints, A/'(A). In this case the caution should be taken for 
inactive constraints. Assume the index set of inactive constraints at x is shown by I 
(row- wise index). Denoting the search direction at x by p = Zv, for all index i G I so 
that a^p > 0, any positive move along p will not violate the corresponding constraint 
(ai denotes the i-th row of A). Therefore, constraints with non-negative af p do not 
pose any restriction on the stepsize. However, there is a critical step length, 7^, for 
indices with af p < 0, where the constraint becomes binding, i.e., af (x -t- 7ip) = b^. 
So, the upper bound on the stepsize due to feasibility of iterates is computed by the 
following relation 

a = min { + 00, 7, = (b, - af x)/(af p) | af p < O} (3.2) 

Similarly, the constrained optimization problem (LIP) can be converted to the follow- 
ing unconstrained optimization problem on the reduced space spanned by the active 
constraints 

min /(xo + aZv), ae[0,a] (RLIP) 
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By some simple linear algebra, it is easy to show that 

cond(Z'^V^/(x)Z) < cond(V^/(x)) coiid(Z)^ 

therefore, to cope the possible instability during the computations (consider the lim- 
ited precision arithmetic of computers) , it is preferable to use an orthogonal null-space 
basis; cond(Z) = 1. The QR factorization [15] is a common way to compute an or- 
thogonal null-space for a desired full-rank matrix. 

3.1. QR factorization. The QR factorization for the transpose of full-rank 
matrix A S K™^" is expressed in the following form 

A^ = QR - ( Qi Q2 ) (" ^' 

where Q is an orthogonal matrix, Qi e M"^", Q2 E M"x("-™), R^ e ^mxm^ q^^^^ 
Q is orthogonal, it follows that AQ = R-'", or AQi = Ri and AQ2 = which 
results Z = Q2. 

In the present study, the Householder QR algorithm (cf. [15]) is used to compute 
the QR factorization of A. In this method the orthogonal matrix Q is represented by 
the product of Householder reflection matrixes 

Q = H1H2 Hm 

where every Hi e M"^" is an orthogonal matrix with the following generic form 

H = I ruu^ 

where I e M"^" is the identity matrix, r e M and u e M". For details of computing 
Hi factors refer to [15]. In the following we adapt the Householder QR factoriza- 
tion algorithm to compute an orthogonal null-space for linear constraint in knapsack 
problems. 

In the knapsack problems, we have at most one active linear constraint, so we 
need to compute the QR factorization for A = a. Without loss of generality, assume 
that ai y^ (else a simple pivoting should be applied). In this case the factor Q can 
be represented in the following form 

Q = I - TUU^ 

where 

1 "^^ ■ O (-0.1 ■ t \ w w 

ui = 1, Ui^ -, i^2,...,n, T= — - — , C = -sign(ai) a 2 

fli -C C 

moreover, the matrix R = Ri is a 1 x 1 matrix such that its singleton entry is equal 
to C- 

In the null-space method we frequently need to do product of null-space matrix, 
Z, or its inverse, Z^ (consider orthogonality of Z), with a desired vector. Notice that 
matrix Z is formed by removing the first column of matrix Q. Exploiting the specific 
representation of Q it is possible to perform the product of Z with an arbitrary vector 
V G M"^^ in 0(n) arithmetic operations. Considering a pseudo entry uq = for vector 
V, simple linear algebra results 

n-l 

zv — Zv, zvi = Vi-1 — TUi 2^{uj+iVj), i — 1, . . . ,n. 

11 



where zv G M" is expansion of the reduced vector v G M"^^ to the full-space. 

In the similar way, product of Z"^ with an arbitrary vector w G M" can be 
computed by the following relation 

n 

ztw = Z w, ztwi — Wi+i — TUi+i y^iuiWi), i = 1, . . . ,n — 1. 

i=i 

3.2. An alternative null-space by orthogonal projection matrix. The 

orthogonal projection matrix is another possible choice for the null-space of full rank 
matrix A G M™^". The orthogonal projection matrix, P G M™'^", can be computed 
by the following relation 

P = I- A^(AA^)-iA 

The main difference of this null-space basis with the previous mentioned one is that the 
orthogonal projection matrix is not full rank. Therefore, the equivalent unconstrained 
optimization problem will be solved on the full-space M". This may simplifies the 
implementation of method; but probably; in expense of more computational cost. 

Notice that the name "orthogonal projection matrix" should not make misleading 
about the properties of this null-space basis, as in general P is not an orthogonal 
matrix. Therefore, the stability of computation can be in question in using P as the 
null-space matrix. 

Without regard to the stability of computation, the special structure of A in the 
knapsack problems, makes the orthogonal projection matrix an attractive choice to 
make a null-space basis. In this case P can be computed and stored explicitly; without 
any cost; as follows 

P = I (a^a)"^ aa^ 

The product of P with an arbitrary vector v G M" can be computed within 0(n) 
operations; as follows 

pv = Pv, pv.^^Vi-a,^^ — -, i = l,...,n 

where pv G M" is the projection of vector v onto the null-space of A. Using theorem 
2.2, it is trivial to show that the orthogonal projection of an arbitrary point y G M" 
onto equality constraint of knapsack problem (ignoring bound constraints) which is 
denoted by z G M" can be computed by 



a,. 



En 2 ' 



i — 1, . . . ,n (3-3) 



In the similar way, using theorem 2.7, projection point onto knapsack bilateral inequal- 
ity constraint (ignoring bound constraints) is computed by the following relation 

z — mid(zl, y, zu) (3.4) 

where zl and zu are computed by (3.3) in which 6 is replaced by 6/ and by^ respectively. 
Equations (3.3) and (3.4) are useful for treatment of linear constraints by projection 
in the unconstrained solver. 
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3.3. Unconstrained optimization on the reduced space. In this subsec- 
tion we want to comment on the exploiting an unconstrained optimization solver on 
the null-space of (active) linear constraints. Basically this procedure is straightfor- 
ward and the unconstrained optimization solver can be blind about the constrained 
problem, i.e., we can use the unconstrained optimization solver as a black-box. The 
following items are sufficient to exploit the unconstrained solver for this purpose: 

(a) At the initial step we need the starting point Xq G M" which is feasible with 
respect to linear constraints. It can be achieved by projecting a trail point 
onto the constraint set. This point is stored in memory and used during 
the unconstrained optimization as described below. The starting point of 
unconstrained optimization solver, vg G M"^'", can be taken equal to € 

(b) When the unconstrained solver asks for the value of the objective function at 
the trial point v/j e M"^'", we should first compute the corresponding value 
of v/c on the full-space, x^ € M", by this relation x^ = Xq -f Zv^. Then /(x^) 
is passed to the unconstrained solver as the value of the objective function at 

Vfe. 

(c) Similar to item (b), when the unconstrained solver asks for the gradient of 
the objective function, Vv/(vfc), at the trial point v^, we ffi'st compute x^ = 
Xq + Zvfe. Then pass Z^Vx/(xfc) to the unconstrained solver as the desired 
gradient. 

(c) Similarly, when the unconstrained solver asks for the Hessian matrix of the 
objective function, Vj/(vfe), at the trial point Vfc, the value of Z^Vx/(xo -I- 
Zvfc)Z is passed to it as the desired Hessian matrix. 

(d) The stepsize related to the globalization strategy of the unconstrained solver 
should be restricted to interval [0,a], where a is computed by (3.2). In 
the present study, this item only applied in the case of inequality knapsack 
problems. If the active set of linear constraints at the optimal solution be 
determined in a finite number of iterations, this stepsize restriction does not 
destroy the local convergence rate of the unconstrained solver. For the finite- 
cycle determination of active set, it is required that the new active set be a 
sub-set of the previous one; which is possible to ensure in the case of no non- 
degeneracy of the optimal solution with respect to the linear active constraints 
(the lagrange multiplier corresponding to the active constraint be non-zero 
at the local solution). 

4. Hager-Zhang active set algorithm for knapsack problems. Consider a 
bound constrained nonlinear program, if the set of active constraints at a local solution 
be a-priori known, it is possible to fix these constraints to the corresponding bound 
values and solve an unconstrained optimization problem on the reduced space spanned 
by the free variables. This is the key idea of the active set strategy. In practice, the 
set of active indices in not a-priori known, therefore they should be identified by 
an appropriate prediction correction strategy. Under appropriate conditions, this 
procedure can be performed in a finite number of iterations. However, in general it 
is possible to add (remove) only one index to (from) the current active set. This 
increases the necessary number of iterations in particular for large scale problems. 
Fortunately, it is possible to add many constraints to working set by means of the 
projected gradient method. This idea was used in [24, 21, 4, 20] to efficiently exploit 
the active set strategy. 

In this section we adapt the Hager-Zhang active set algorithm (HZ-ASA) [20] to 
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solve knapsack problems. Using the previously constructed tools, this extension is 
trivial and the convergence theory holds almost under the same conditions. The main 
reason for selection of this algorithm is its excellent convergence theories in addition 
to the promising numerical results reported in [20]. Moreover, unlike [24, 21], this 
method admits the superlinear convergence under the same conditions while it does 
not need any explict form of second order information and/or solution of system of 
linear equations per cycle. These properties makes it an ideal choice to solve very large 
scale problems. The current author believes that the key efheiency of this method can 
be connected to its two-phase nature. Unlike the related methods which use the same 
interactions in the course of optimization, this method start with a cheap constrained 
first-order method and after sufficient progress toward a local solution, branches to 
a (more expensive) higher-order unconstrained solver. In other words, it does not 
waste the energy at early stages by performing expensive and accurate steps. Under 
certain conditions, the method only branches once between two phases. This strategy 
is particularly effective when the initial guess is poor. Note that the basic idea for 
this strategy is not new and already used in [4]. 

In [20] the nonmonotone spectral projected gradient [5] (SPG) was used to identify 
the working set. In this method, the search direction is parallel to the steepest 
descent direction (is descent) and the stepsize is computed by projecting the (spectral) 
Barzilai-Borwein [2] step onto the feasible set. Moreover, the Grippo-Lampariello- 
Lucidi nonmonotone line search algorithm [17] is used to ensure the convergence to 
a local minimum from an arbitrary initial guess, simultaneously benefiting from the 
spectral property of the Barzilai-Borwein stepsize as much as possible. The SPG 
method can be efficiently applied to every convex constrained optimization problem 
(under some common conditions) providing an efficient method to project a trial point 
onto the feasible set. Therefore, by means of 0(n) projection algorithm introduced 
in section 2, SPG can be effectively used to solve large scale knapsack problems. 
However, the local convergence of SPG method will not be better than linear in the 
vicinity of the local solution. 

Let a G K, the scaled projected gradient, d"(x) is defined as follows 

d"(x)=Pp(x-aV,,/(x)^)-x 

It is easy to show that d"(x) is a constrained descent direction (e.g. see: lemma 2.1 
of [5] or proposition 2.1 of [19]). Therefore, it can be used as the stopping criteria 
to measure the distance of current iterate from the optimal solution. Assume S/; = 
(xfc — x/;_i) and yfc = (Vx/(xfc) — Vx/(xfe_i)). In the Barzilai-Borwein method, the 
trial stepsize, ak, is computed from a one-dimensional search method based on the 
following relation 

ttfc = argmin - l|D(a) Sfc_i - yfc_il|^ = {sl_j^Sk-i)/{sl_^yk-i). (4.1) 

where D(a) = -I is a diagonal approximation to the Hessian matrix Vx/(x). The 
spectral property of Barzilai-Borwein method can be inferred from the following 
lemma: 

Lemma 4.1. The Barzilai-Borwein stepsize, is equal to inverse of Rayleigh quo- 
tient, related to vector Sk-i, of the averaged Hessian of the objective functional between 
two consecutive iterations k — 1 and k. 

Proof. Using the Mean- Value Theorem it is easy to show that: 

(Vx/(xfc)-Vx/(xfc_l))/(xfc-Xfc_l)= / V2/(tXfc + [l-i]xfc_l)d^, 

"'0 
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therefore, 

(sLiy/c-i)/(sLiSfc-i) = (^fe-i( / VV(xfc-i+iSfc_i)dt)sfc_ij/(sJ_iSfc_i), 

which complete the proof. DBy lemma 4.1, A^]j^ ^ ctk ^ ^mim where Kmax and 
^min are the maximum and minimum eigenvalues of the averaged Hessian matrix re- 
spectively. Therefore, Offcl is a consistent approximation to the inverse of the Hessian 
matrix. The statement of SPG algorithm is as follows: 

Algorithm 1: Outline of spectr^aJ projected gradient (SPG) algorithm 

1 given € e [0, oo), < a-^ij, < Umax < 00 ; 

3 take fc = and Xo £ P ; 

■.i while IkU.II > e do 

4 if [k = or sj, yk ^ 0) then c" = 1; 

T 

5 else (T = mid[((j„,i„, ^f^, o"maz); 

6 take d^ as the search direction; 

7 compute stepsize af. by noiinionotone line search; 

8 x^+i = xt + akdl, k = k+\ ; 
end 



Note that the condition s^yfe ^ in line 4 of algorithm 1 means the detection of neg- 
ative curvature directions. Assuming that the objective function is consistent with its 
quadratic approximation around the current iterate, the large stepsize tr = 1 is used 
to achieve the maximum reduction. The statement of the nonmonotone line search 
used in this study is as follows: 





Algoritlini 2: Outline of nonmonotone line search algorithm 


1 given M e N (Af > 0), 7 e (0,1), < qw« < a„„^ < 1, ; 


2 take a = 1,6= (d^fV^fk, x+ = xt + df. ; 

3 /^— = niax{/fc_j 1 ^ j ^ min{^%M - 1}} ; 


4 w 
& 

G 

7 


diile (/(x+)>/P" + a75)do 
at = ia25/{/(x+)-/(xi.)-a5); 
if («mm ^ Ot < aa^^ax) then a = a^; 
else a = q/2; 


S 

9 e 


x+ = Xfc + ad% ; 
nd 





Notice that when the decrease condition in line 4 of algorithm 2 is not satisfied a 
quadratic interpolation (line 5) is used to compute the trail step at and it is accepted 
if lies within interval [amin, otOLmax]^ otherwise a bisection will be performed (line 7). 
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The quadratic function q{a) is formed such that q{0) = /(x^), q{a) — /(x+) and 
dq/da = S. 

Theorem 4.2. ([5] theorem 2.4) Algorithm SPG is well-defined, and any ac- 
cumulation point of the sequence {xfc} that it generates is a constrained stationary 
point. 

Further details about SPG algorithm and its convergence theory can be found in 
either of [5] or section 2 of [20]. 

The unconstrained solver (US) used in HZ-ASA is the CGDESCENT algorithm 
[19] which possesses a superlinear local convergence rate under some common con- 
ditions. In HZ-ASA the unconstrained solver is modified to be enable to manage 
infeasible iterations (with respect to bound constraints). Similar to (3.2), it includes 
the modification of the stepsize such that the new iterates will not be infeasible; i.e., 
the trial stepsize is limited to the interval [0, a] where, 

a = max{a £ M+ | (x/(x) + ap/(x)) e B} (4.2) 

The outline of the reduced CGDESCENT (RCGD) algorithm used in this study is as 
follows: 





Algorithm 3: Outline of RCGD algorithm 


1 given € e [0, oo) , < q,„™ < Omo^ < '^c^Xq e H" , Xq e P, t = 


= dim (^o); 


2 take vo = 0, do = go = (Z^shriiik(Vx/(xo)))^, k = 0; 




3 wliile 1 g;. > e do 




4 


find a^: by monotone line searcli ; 




5 


compute Oik and d^ using (3.2) and (4.2) respectively; 




6 


ak = min{ak,ak, at,}, Qfc = mid(a„ii„,afc,a„„^); 




7 


Vt+l =Vk+&kdk: 




8 


&+1 = (Z^slirliik(Vx/(xo -hexpand(Zv^+i))))^ ; 




9 


yt = gfc+i - gfc; 




10 


/3. = 5-(y.-2d,!-f;); 




11 


dfc+i = dkdk - gfe+i, k = k-\-l ; 




12 end 







Notice that the value of active constraints will be fixed during RCGD iterations. The 
feasibility of free indices with respect to bound constraints will be maintained by ak- 

def 

When D = Dg then we have always one active linear constraint and work with d, 

g, V, y G M"^*^^ and a^. ~ oo. When V = Vj and both of the linear inequality 
constraints be inactive d, g, v, y e M"^* and Z = I. In this case the feasibility of 
iterations with respect to equality constraints will be controlled by ak- Finally, when 

def 

D — Dx and one of the linear inequality constraints be active we will work with d, 
g, V, y G M"^*^^ and the feasibility of iterations with respect to linear constraints 
will be controlled by a^. 

The monotone line-search used in RCGD algorithm should satisfy either of the 
standard Wolfe conditions or approximate Wolfe conditions introduced in [19]. In 
[19] a robust line search algorithm is suggested which tolerates the finite accuracy 
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precision of computer arithmetic very well. The global convergence and superlinearly 
local convergence of the presented RCGD algorithm can be directly followed from the 
convergence theories of the original CGDESCENT algorithm discussed in [19]. 
Following [20] , the set of undecided indices is defined as follows 

U{^)^{ie [l,n] |5,(x) ^ ||di(x)|P and mm{x, - h,u, - x,} ^ l|di(x)||^} 

where gi{x) denotes i-th component of Vx/(x), a £ (0, 1) and b S (1, 2) (e.g. a = 1/2 
and b — 3/2). In fact, U{x) denotes the set of indices correspond to components 
of X for which the associated gradient component, gi{x), is relatively large while Xi 
is not close to either of h or Ui. When U{x) is empty, it implies that the indices 
with large associated gradient component are almost identified and it is preferable to 
perform unconstrained optimization algorithm on the reduced space of free indices. 
This naturally happens in the vicinity of the local solution. 

Now let us to recall the HZ-ASA [20] here. Using the above mentioned SPG and 
RCGD algorithm it is straightforward to use HZ-ASA algorithm to solve knapsack 
problems. The outline of HZ-ASA for knapsack problems is as follows: 





Algorithm 4: Outline of HZ-ASA for knapyack proV)lcmM 


1 € e [o,.DC'), fi, p G (0, 1), ni, 712 e [1^ "■) ; 


Phase 1 (Active set identification by SPG): 


2 wliile di > e do 


3 


Do a SPG iteration and check the following:; 


4 


if ZYfxfc) =0 tlien 


5 




if |s]irink(V5c/(Xjt))| > /i]|d^ | then j.i = ppL\ 


G 




else goto Phase 2 


7 


else 


8 




if Ak = Ak-i = ■■■= Ak-n^ and [[shrink(Vx/(x;,))[[ ^ p\\dl\\ then 
goto Pliai5e 2 


9 


end 


10 end 


Pliase 2 (Unconstrained optimization by RCGD): 


11 wliile d^,| > e do 


12 


Do a RCGD iteration and check the following:; 


13 


if ||shrlnk(Vx/(x(,)|| > fi||d^|| then goto Pliase 1 and restart SPG ; 


14 


if \Ak-i\ <\Ak\ then 


15 




if W(xit) = or ^^ > pls.._i -1- nj then restart RCGD at x^; 


16 




else goto Phase 1 and restart SPG 


17 


end 


18 end 





The convergence theories stated in [20] for HZ-ASA is almost hold for algorithm 
4. In the remaining part of this section, we shall re-state results of [20] for algorithm 
4. 

Theorem 4.3. (global convergence) Let C be the level set defined by 
£-{xeP:/(x)</(xo)}. 
We assume the following conditions hold: 
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Gl. / is bounded from below in C and dmax — supj,||di:|| < cxd. 

G2. If C is the collection o/x e 2? whose distance to C is at most d,„ax7 then \7 f 
is Lipschitz continuous on C. 
Then either algorithm 4 with e = terminates in a finite number of iterations at a 
stationary point, or we have liminf \\d (a;fc)||oo = 0. 

Proof, followed from theorem 4.1 of [20]. D 

Theorem 4.4. (global minimizer) If f is strongly convex and twice continuously 
differentiate on T> , then the iterates x^ of algorithm 4 with e = converge to the 
global minimizer of (1-1) • 

Proof, see corollary 4.2 of [20] .D 

The following theorem results the local superlinear convergence rate of algorithm 
4 to a nondegenerate stationary point. 

Theorem 4.5. (nondegenerate local convergence) If f is continuously differen- 
tiable and the iterates Xj, generated by algorithm 4 with 6 = converge to a nondegen- 
erate (with respect to both of the bound and linear constraints) stationary point x* , 
then after a finite number of iterations, algorithm 4 performs only RCGD iterations 
without restarts. 

Proof, followed from theorem 5.1 of [20] .D 

The following theorems results the local superlinear convergence rate of algorithm 
4 to a stationary point (even degenerate stationary point) under strong second-order 
sufficient optimality condition. 

Theorem 4.6. (degenerate local convergence) Assume T> = T>£ in problem (1.1) 
and f is is twice- continuously differentiate. If the iterates Xfc generated by algorithm 
4 with e = converge to a stationary point x* satisfying the strong second- order 
sufficient optimality condition, then after a finite number of iterations, algorithm 4 
performs only RCGD iterations without restarts. 

Proof, followed from theorem 5.7 of [20] .D 

de f 

Theorem 4.7. (degenerate local convergence) Assume T> — T>x in problem 
(1.1) and f is is twice- continuously differentiable. If the iterates X/j generated by 
algorithm 4 with e = converge to a stationary point x* satisfying the strong second- 
order sufficient optimality condition and x* is not degenerate with respect to the linear 
inequality constraint, i.e., the optimal value of lagrange multiplier corresponding to the 
active linear constraint is nonzero, then after a finite number of iterations, algorithm 
4 performs only RCGD iterations without restarts. 

Proof, followed from theorem 5.7 of [20] .D 

Remark 4.8. There is a simple strategy to virtually benefit from the local super- 
linear convergence rate of algorithm 4 in the case of degeneracy of the minimizer with 
respect to linear inequality constraints. Since the different states of the active set of 
(1.1) with respect to linear constraints includes only three situtions, it is possible to 
solve (1.1) sequentially three times with algorithm 4 using its strong local convergence. 
Assume, x^, x^* andx.^^ are solutions of (1.1) by algorithm 4 without considering the 
linear constraint in RCGD, considering the linear equality constraint with b = bi in 
RCGD and considering the linear equality constraint with b — bu in RCGD respec- 
tively. Note that we can first solve for x*^ to possibly avoid further computation. If 
x*j satisfies the bilateral inequality constraint, it is equal to a local solution of problem 
and the computation will be discarded (of course in this case there is no degeneracy 
with respect to linear constraints). Otherwise, the local solution of problem is one of 
X;* or x* which correspond to the smaller objective function value. Therefore, this 
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strategy finds a local solution of problem by solving three sequential superlinearly con- 
vergent optimizations. However, as the cost of computation is tripled by this strategy, 
the overall rate of convergence can be sub-linear in the worst conditions. 

5. Birgin-Martinez active set algorithm for knapsack problems. In [4] 

an active set algorithm is introduced to solve large-scale bound constrained optimiza- 
tion problems. The general outline of this algorithm is very similar to that of Hager 
and Zhang; and differences are mainly related to some technical issues. It's worth 
mentioning that this algorithm does not posses a superlinear local rate of convergence 
like Hager-Zhang algorithm, and in the best conditions the local rate of convergence 
will not be better than linear. However, our numerical experiences showed that the 
performance of this algorithm is very promising solving large-scale real world prob- 
lems (e.g. when using accurate functional value and its gradient is not economic or 
possible) . 

Like Hager-Zhang active set algorithm, Birgin-Martinez active set algorithm is a 
two-phase algorithm: a gradient projection step to explore the working set and an 
unconstrained optimization solver on the space spanned by free variables. In [1] this 
algorithm was extended to solve linearly constrained optimization problems. Due to 
its expensive projection and null-space steps, this algorithm is not feasible for large- 
scale problems. Therefore, our algorithm is somehow an adaptation of [1] algorithm 
to solve knapsack problems. To do this job, one case follows the same procedure 
as described in the previous section. More clearly, this algorithm includes a general 
projection onto constraint space which should be replaced by our problem-specific 
projection step. Moreover, this algorithm uses a general null-space manipulation of 
active constraints which again should be replaced by our efficient problem-specific 
null-space method. 

Our adapted algorithm here will be almost identical to Algorithm 3.1 of [1] with 
this difference that the partial projection step in [1] should be replaced by our exact 
projection step (the parameter S in this algorithm should be replaced by oo). The 
convergence theories of our algorithm will be in fact the simplified version of that of 
[1]. For more details, interested readers are referred to [4, 1] and also our Fortran 
implementation of method which is freely available from web. 

6. Application to topology optimization problems. In this section we ap- 
ply the introduced method to solve topology optimization problems, cf. [3]. As a 
model problem we will solve the problem already considered in [12]. The description 
of this problem is as follows: considering two isotropic conducting materials, with 
thermal conductivities ka and A:^ , < /cq < fc^ , in a simply connected design domain 
H, C M.'^{d = 2, 3); the goal is mixing these materials with a fixed ratio in fi, such that 
the total temperature gradient in Q is minimized under the thermal load / G i^(r2), 
more precisely: 



argmiutu J{w) — ^ J^ \79 ■ \79 dx, 
subject to: 



(P) { 



-V • {k{w)ve) ^ /(x) in n 

6i(x) = 6*0 (x) on on 

k(w) = wkp -f (1 — w)ka in Q 

/j^u)dx= R\ni 0<i?<l, Os^w^l 
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where 9 e _ff^(J7) is the state variable and w G LF'{Vt) is the control parameter 
(topology indicator field). By straightforward derivation, the first order optimality 
condition for problem P can be written in the following form: 



V • (fcHve) = 


/(x) 


in 


n 


0(x) = 


00 (x) 


on 


on 


V • (fc(u;)Vr/) = 


-V • (V0) 


in 


n 


ry(x) = 





on 


on 


k{w) = 


wkp + (1 - w)ka 


in 


n 


VvAG{^)) = 





in 


n 


G(x) = 


-{k^-k^)V9-V^ 


in 


n 



[OC) 



where 77 e Hq{^) is the adjoint state, G is the L^ gradient of the objective func- 
tional with respect to w and 7'p^(u) denotes the L^ projection of function u onto the 
admissible set I?x, 

Vi = {wC,L^{VL)\ f w{x)dx = R\n\, 0<i?<l, Os^ws^l} 
Jn 

Consider the finite-dimensional counterpart of problem (P) which is resulted by 
discretization of il into an n control volumes. Also assume that the state variables and 
design parameter are defined at the center of each control volume. Under these as- 
sumptions, the admissible design domain, Vj forms a simplex in M" which is identical 
to the knapsack constraint. If we solve the state PDE at every stage of optimization 
(for a specific value of w), then the above optimization problem will be equivalent to 
problem (1.1). Note that vector a in (1.2) is equivalent to the volume of each control 
in this model problem. 

Since it is not easy to construct the above topology optimization problem such 
that it ensures pre-requirement of extended Hager-Zhang active set algorithm, the 
extended Birgin-Martinez active set algorithm will be used here. Moreover, it is 
worth mentioning that in all of our experience with these algorithm (solving similar 
optimization problems), the extended Birgin-Martinez perforins superior. We think 
that main reasons for this observation is the inaccuracy of objective function and 
its gradient in our experience besides to our failure to adjust the algorithm's control 
appropriately. 

To do numerical experiment we solve problem (P) in two and three dimensions for 
fi = [0, 1]^ and fi = [0, 1]^ respectively. The spatial domain is divided into 127^ and 
31^ control volumes in two and three dimensions respectively. In all of experiments 
here ka = 1, kp — 2, /(x) = 1 and R — 0.4. The governing PDE are solved by cell 
centered finite volume method using central difference scheme and related system of 
linear equations are solved by preconditioned conjugate gradient method with relative 
convergence threshold l.e — 20. As an stopping criteria, the optimization is discarded 
when the relative variation of topology becomes bellow l.e — 3. Notice that using finite 
volume method we do not observe any topological instability problem (checkerboard 
pattern) which is common in topology optimization problems using finite element 
method. 

Results of our numerical experiments includes the variation of the objective func- 
tional during optimization cycles and final resulted topology (w- field), are shown in 
figures 6.1 and 6.2. Plots obviously show the success of the presented algorithm to 
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solve topology optimization problems. Note that in our procedure the PDE constraint 
is satisfied upto the accuracy of method used to solve PDE and knapsack constraint is 
satisfied upto the machine precision. It is worth mentioning that we already applied 
the presented method successfully to vast families of topology optimization problems 
which are not included here due to the space constraint. 



Fig. 6.1. Variation of scaled objective function during optimization cycles for 2D (left) and 3D 
(right) examples. 
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Fig. 6.2. Final distribution of material field, w, for 2D (left) and 3D (right) examples. 



7. Summary. In the present study, a general framework is suggested to adapt 
bound-constrained optimization solvers to solve optimization problems with bounds 
and an additional linear constraint (in the form of either equality or bilateral inequal- 
ity). This framework includes two main ingredients which are the projection of an 
arbitrary point onto the feasible set and null-space manipulation of the related linear 
constraint. The implementation of the method with specific focus on the Hager- 
Zhang active set algorithm [20] is discussed in details. Moreover, we briefly discuss 
the adaptation of Birgin-Martinez active set algorithm [4] to there problems which is 
structurally very similar to Hager-Zhang active set algorithm. 
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It seems that, following the presented approach, it should be possible to perform 
such an extension for alternative box-constrained optimization methods. Notice that 
this extension can be much simpler and straightforward for some box-constrained 
solvers. For instance, using just the projection operator introduced in this study, it 
is natural to extend Newton method of [24] or affine-scaling interior-point Newton 
method of [21] for this purpose without further difficulties. 
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